Inferring Short-Sightedness in Dynamic Noncooperative Games

arXiv.org Artificial Intelligence

Dynamic game theory is an increasingly popular tool for modeling multi-agent (e.g., human-robot) interactions. Game-theoretic models presume that each agent wishes to minimize a private cost function that depends on others' actions. These games typically evolve over a fixed time horizon, which specifies the degree to which all agents care about the distant future. In practical settings, however, decision-makers may vary in their degree of short-sightedness. We conjecture that quantifying and estimating each agent's short-sightedness from online data will enable safer and more efficient interactions with other agents. To this end, we frame this inference problem as an inverse dynamic game. We consider a specific parametrization of each agent's objective function that smoothly interpolates between myopic and farsighted planning. Games of this form are readily transformed into parametric mixed complementarity problems; we exploit the directional differentiability of solutions to these problems with respect to their hidden parameters in order to solve for agents' short-sightedness. We conduct several experiments simulating human behavior at a real-world crosswalk. The results of these experiments demonstrate that by explicitly inferring agents' short-sightedness, we can recover more accurate game-theoretic models, which ultimately allow us to make better predictions of agents' behavior. Specifically, our results show up to a 30% more accurate prediction of myopic behavior compared to the baseline.
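One way to picture an objective that interpolates between myopic and farsighted planning is a discounted sum of stage costs. The abstract does not state the parametrization used, so the discount factor `gamma`, the function names, and the example cost sequences below are all assumptions made for illustration only.

```python
def discounted_cost(stage_costs, gamma):
    """Sum of stage costs weighted by gamma**t (gamma is an assumed parameter).

    gamma = 0 keeps only the first stage (myopic);
    gamma = 1 weights every stage equally (farsighted).
    """
    return sum((gamma ** t) * c for t, c in enumerate(stage_costs))


def preferred_plan(plans, gamma):
    """Return the name of the plan with the lowest discounted cost."""
    return min(plans, key=lambda name: discounted_cost(plans[name], gamma))


# Hypothetical plans: "rush" is cheap now but costly later, "wait" is the opposite.
plans = {"rush": [1.0, 5.0, 5.0], "wait": [3.0, 1.0, 1.0]}
print(preferred_plan(plans, 0.0))  # rush
print(preferred_plan(plans, 1.0))  # wait
```

Under this assumed model, the same agent picks different plans depending on `gamma`, which is why estimating such a parameter from observed behavior can change the predictions a game-theoretic model makes.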


Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks

arXiv.org Artificial Intelligence

We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a multi-hop wireless network with statistically identical agents. Agents cache the most recent samples from others and communicate over wireless collision channels governed by an underlying graph topology. Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies, considering both oblivious policies (where decision-making is independent of the physical processes) and non-oblivious policies (where decision-making depends on the physical processes). We prove that in oblivious policies, minimizing estimation error is equivalent to minimizing the age of information. The complexity of the problem, especially the multi-dimensional action spaces and arbitrary network topologies, makes theoretical methods for finding optimal transmission policies intractable. We optimize the policies using a graphical multi-agent reinforcement learning framework, where each agent employs a permutation-equivariant graph neural network architecture. Theoretically, we prove that our proposed framework exhibits desirable transferability properties, allowing transmission policies trained on small- or moderate-size networks to be executed effectively on large-scale topologies. Numerical experiments demonstrate that (i) our proposed framework outperforms state-of-the-art baselines; (ii) the trained policies are transferable to larger networks, and their performance gains increase with the number of agents; (iii) the training procedure withstands non-stationarity even if we utilize independent learning techniques; and (iv) recurrence is pivotal in both independent learning and centralized training and decentralized execution, and improves the resilience to non-stationarity in independent learning.
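As a rough illustration of the age-of-information metric mentioned above, the sketch below computes a time-average age for a single receiver in discrete time. The slotted time model, the function name `average_age`, and the assumption that the receiver starts with a sample generated at time 0 are illustrative choices, not details taken from the paper.

```python
def average_age(deliveries, horizon):
    """Time-average age of information over slots 0..horizon-1.

    deliveries maps a slot index to the generation time of the sample
    received in that slot. The receiver is assumed to hold a sample
    generated at time 0 when the run starts.
    """
    latest = 0  # generation time of the freshest cached sample
    total = 0
    for t in range(horizon):
        if t in deliveries:
            latest = max(latest, deliveries[t])
        total += t - latest  # age in slot t
    return total / horizon


# A fresh sample arrives in slot 2; a one-slot-old sample arrives in slot 4.
print(average_age({2: 2, 4: 3}, 6))  # ages 0,1,0,1,1,2 -> 5/6
```

With no deliveries at all, the age simply grows by one per slot, so more frequent or fresher deliveries lower the average.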


Causal Coordinated Concurrent Reinforcement Learning

arXiv.org Artificial Intelligence

In this work, we propose a novel algorithmic framework for data sharing and coordinated exploration, with the aim of learning more data-efficient and better-performing policies in a concurrent reinforcement learning (CRL) setting. In contrast to other works, which assume that all agents act in identical environments, we relax this restriction and instead consider the formulation where each agent acts within an environment that shares a global structure but also exhibits individual variations. Our algorithm leverages a causal inference algorithm, the Additive Noise Model - Mixture Model (ANM-MM), to extract model parameters governing individual differentials via independence enforcement. We propose a new data sharing scheme based on a similarity measure of the extracted model parameters and demonstrate superior learning speeds on a set of autoregressive, pendulum, and cart-pole swing-up tasks. Finally, we show the effectiveness of diverse action selection between common agents under a sparse reward setting. To the best of our knowledge, this is the first work to consider non-identical environments in CRL and one of the few works that seek to integrate causal inference with reinforcement learning (RL).


Inferring Occluded Agent Behavior in Dynamic Games with Noise-Corrupted Observations

arXiv.org Artificial Intelligence

Robots and autonomous vehicles must rely on sensor observations, e.g., from lidars and cameras, to comprehend their environment and provide safe, efficient services. In multi-agent scenarios, they must additionally account for other agents' intrinsic motivations, which ultimately determine the observed and future behaviors. Dynamic game theory provides a theoretical framework for modeling the behavior of agents with different objectives who interact with each other over time. Previous works employing dynamic game theory often overlook occluded agents, which can lead to risky navigation decisions. To tackle this issue, this paper presents an inverse dynamic game technique which optimizes the game model itself to infer unobserved, occluded agents' behavior that best explains the observations of visible agents. Our framework concurrently predicts agents' future behavior based on the reconstructed game model. Furthermore, we introduce and apply a novel receding horizon planning pipeline in several simulated scenarios. Results demonstrate that our approach offers 1) robust estimation of agents' objectives and 2) precise trajectory predictions for both visible and occluded agents from observations of only visible agents. Experimental findings also indicate that our planning pipeline leads to safer navigation decisions compared to existing baseline methods.


Sim-and-Real Reinforcement Learning for Manipulation: A Consensus-based Approach

arXiv.org Artificial Intelligence

Sim-and-real training is a promising alternative to sim-to-real training for robot manipulation. However, current sim-and-real training is neither efficient (it converges slowly to the optimal policy) nor effective (it requires sizeable real-world robot data). Given limited time and hardware budgets, its performance is not satisfactory. In this paper, we propose a Consensus-based Sim-And-Real deep reinforcement learning algorithm (CSAR) for manipulator pick-and-place tasks, which shows comparable performance in both the simulated and real worlds. In this algorithm, we train the agents in simulators and the real world to obtain the optimal policies for both. We found two interesting phenomena: (1) the best policy in simulation is not the best for sim-and-real training; (2) the more simulation agents, the better the sim-and-real training. The experimental video is available at: https://youtu.be/mcHJtNIsTEQ.


No Bidding, No Regret: Pairwise-Feedback Mechanisms for Digital Goods and Data Auctions

arXiv.org Artificial Intelligence

The growing demand for data and AI-generated digital goods, such as personalized written content and artwork, necessitates effective pricing and feedback mechanisms that account for uncertain utility and costly production. Motivated by these developments, this study presents a novel mechanism design addressing a general repeated-auction setting where the utility derived from a sold good is revealed post-sale. The mechanism's novelty lies in using pairwise comparisons for eliciting information from the bidder, arguably easier for humans than assigning a numerical value. Our mechanism chooses allocations using an epsilon-greedy strategy and relies on pairwise comparisons between realized utility from allocated goods and an arbitrary value, avoiding the learning-to-bid problem explored in previous work. We prove this mechanism to be asymptotically truthful, individually rational, and welfare and revenue maximizing. The mechanism's relevance is broad, applying to any setting with made-to-order goods of variable quality. Experimental results on multi-label toxicity annotation data, an example of negative utilities, highlight how our proposed mechanism could enhance social welfare in data auctions. Overall, our focus on human factors contributes to the development of more human-aware and efficient mechanism design.
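To make the combination of epsilon-greedy allocation and pairwise-comparison feedback more concrete, here is a minimal sketch. It is not the paper's mechanism: it assumes utilities lie in [0, 1], that the reference value is drawn uniformly from [0, 1] (so the binary answer "was the utility higher than the reference?" is an unbiased estimate of the utility), and that each buyer's realized utility is fixed. The function name and all parameters are hypothetical.

```python
import random


def run_allocation(true_utilities, rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy allocation driven only by binary comparison feedback."""
    rng = random.Random(seed)
    n = len(true_utilities)
    counts = [0] * n
    estimates = [0.0] * n  # running mean of comparison outcomes per buyer
    for _ in range(rounds):
        if rng.random() < epsilon:
            i = rng.randrange(n)  # explore: allocate to a random buyer
        else:
            i = max(range(n), key=lambda k: estimates[k])  # exploit
        reference = rng.random()  # assumed uniform reference value
        # Pairwise comparison: the buyer only says whether utility beat the reference.
        feedback = 1.0 if true_utilities[i] > reference else 0.0
        counts[i] += 1
        estimates[i] += (feedback - estimates[i]) / counts[i]
    return estimates, counts


estimates, counts = run_allocation([0.2, 0.8])
print(estimates, counts)
```

Under these assumptions, the buyer never reports a number, yet the running averages approach the underlying utilities and most allocations end up going to the higher-utility buyer.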


Distributed Optimal Formation Control for an Uncertain Multiagent System in the Plane

arXiv.org Artificial Intelligence

In this paper, we present a distributed optimal multiagent control scheme for quadrotor formation tracking under localization errors. Our control architecture is based on a leader-follower approach, where a single leader quadrotor tracks a desired trajectory while the followers maintain their relative positions in a triangular formation. We begin by modeling the quadrotors as particles in the YZ-plane evolving under dynamics with uncertain state information. Next, by formulating the formation tracking task as an optimization problem -- with a constraint-augmented Lagrangian subject to dynamic constraints -- we solve for the control law that leads to an optimal solution in the control and trajectory error cost-minimizing sense. Results from numerical simulations show that for the planar quadrotor model considered -- with uncertainty in sensor measurements modeled as Gaussian noise -- the resulting optimal control is able to drive each agent to achieve the desired global objective: leader trajectory tracking with formation maintenance. Finally, we evaluate the performance of the control law using the tracking and formation errors of the multiagent system.


Decentralized Distributed Expert Assisted Learning (D2EAL) approach for cooperative target-tracking

arXiv.org Artificial Intelligence

This paper addresses the problem of cooperative target tracking using a heterogeneous multi-robot system, where the robots communicate over a dynamic communication network and the heterogeneity lies in the different types of sensors and prediction algorithms installed on the robots. The problem is cast into a distributed learning framework, where robots are considered as 'agents' connected over a dynamic communication network. Their prediction algorithms are considered as 'experts' giving their look-ahead predictions of the target's trajectory. In this paper, a novel Decentralized Distributed Expert-Assisted Learning (D2EAL) algorithm is proposed, which improves the overall tracking performance by enabling each robot to improve its look-ahead prediction of the target's trajectory by sharing its information and running a weighted information fusion process combined with online learning of weights based on a prediction loss metric. A theoretical analysis of D2EAL is carried out, covering worst-case bounds on cumulative prediction loss and the convergence of the weights. Simulation studies show that in adverse scenarios involving large dynamic bias or drift in the expert predictions, D2EAL outperforms well-known covariance-based estimate/prediction fusion methods, both in terms of prediction performance and scalability.
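The idea of fusing expert predictions with weights learned online from a prediction loss can be sketched with a standard exponential-weights (Hedge-style) update. This is a generic stand-in, not the D2EAL update rule, which the abstract does not specify; the squared loss, the learning rate `eta`, and the function names are assumptions, and the network aspect (robots exchanging information) is omitted.

```python
import math


def fuse(predictions, weights):
    """Weighted average of expert predictions (weights sum to 1)."""
    return sum(w * p for w, p in zip(weights, predictions))


def update_weights(weights, predictions, target, eta=0.5):
    """Exponential-weights update: larger squared loss means a smaller weight."""
    scaled = [w * math.exp(-eta * (p - target) ** 2)
              for w, p in zip(weights, predictions)]
    total = sum(scaled)
    return [s / total for s in scaled]


def run(expert_streams, targets, eta=0.5):
    """Fuse scalar expert predictions over time, learning weights online."""
    n = len(expert_streams)
    weights = [1.0 / n] * n
    fused = []
    for t, target in enumerate(targets):
        preds = [stream[t] for stream in expert_streams]
        fused.append(fuse(preds, weights))  # predict before seeing the target
        weights = update_weights(weights, preds, target, eta)
    return fused, weights


# Hypothetical data: expert 0 is accurate, expert 1 has a constant bias of +2.
targets = [1.0] * 20
fused, weights = run([[1.0] * 20, [3.0] * 20], targets)
print(fused[0], fused[-1], weights)
```

In this toy run the first fused prediction is the plain average of the two experts, and the biased expert's weight then decays quickly, so later fused predictions track the accurate expert.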


Fragile object transportation by a multi-robot system in an unknown environment using a semi-decentralized control approach

arXiv.org Artificial Intelligence

In this paper, we introduce a semi-decentralized control technique for a swarm of robots transporting a fragile object to a destination in an uncertain occluded environment. The proposed approach is split into two parts. The first part (Phase 1) uses a centralized control strategy to create a specific formation among the agents so that the object to be transported can be positioned properly on top of the system. We present a novel triangle packing scheme fused with a circular region-based shape control method for creating a rigid configuration among the robots. In the second part (Phase 2), the swarm is required to convey the object to the destination in a decentralized way, employing the region-based shape control approach. The simulation results and the comparison study demonstrate the effectiveness of our proposed scheme.